Abstract: For many computer vision and machine learning problems, large data sets are key to good performance. However, in many computer vision and machine learning algorithms, finding nearest neighbour matches in a large data set is the most computationally expensive part. New algorithms for approximate nearest neighbour matching are developed, evaluated, and compared with previous algorithms. Efficient algorithms are essential for matching high-dimensional features. Locality sensitive hashing (LSH) is perhaps the best-known hashing-based nearest neighbour technique. It uses multiple hash functions with the property that elements close to each other are likely to receive the same hash value, that is, to fall into the same bucket. Variants of LSH improve on it: multi-probe LSH lowers the high storage cost by reducing the number of hash tables needed, and LSH Forest adapts better to the data without requiring hand tuning of parameters. The optimal nearest neighbour algorithm and its parameters depend on the characteristics of the data set, so an automated configuration procedure for finding the best algorithm to search a particular data set is described. Very large data sets may not fit in main memory. Possible solutions for scaling to such data include performing dimensionality reduction, keeping the data on disk and loading only parts of it into main memory, or distributing the data across several computers and using a distributed nearest neighbour search algorithm.
Keywords: Nearest Neighbour Search, Big Data, Approximate Search.
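The LSH idea described in the abstract can be sketched in a few lines. This is a minimal illustration only, assuming the random-hyperplane hash family for cosine (angular) similarity. The abstract does not say which hash family is used, and the class name `HyperplaneLSH` and its parameters `num_tables` and `bits_per_table` are hypothetical. Several hash tables are built, and each table hashes a vector by the signs of its projections onto a few random hyperplanes. A query gathers the union of its buckets as candidates and re-ranks only those candidates exactly.

```python
import math
import random
from collections import defaultdict


class HyperplaneLSH:
    """Hypothetical random-hyperplane LSH index for cosine similarity.

    Assumption: random hyperplanes are one illustrative LSH family,
    not necessarily the one used in the paper.
    """

    def __init__(self, dim, num_tables=8, bits_per_table=6, seed=0):
        rng = random.Random(seed)
        # One independent set of random hyperplane normals per hash table.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits_per_table)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.points = []

    @staticmethod
    def _hash(planes, v):
        # Each bit records which side of one hyperplane v lies on. Vectors at
        # a small angle agree on most bits, so they tend to share a bucket.
        return tuple(
            int(sum(p_i * v_i for p_i, v_i in zip(p, v)) >= 0.0) for p in planes
        )

    def add(self, v):
        idx = len(self.points)
        self.points.append(v)
        for planes, table in zip(self.planes, self.tables):
            table[self._hash(planes, v)].append(idx)
        return idx

    def query(self, q, k=1):
        # Candidates: union of the buckets q lands in, across all tables.
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._hash(planes, q), []))

        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(y * y for y in b))
            return dot / (na * nb) if na and nb else 0.0

        # Exact re-ranking is applied only to the small candidate set.
        ranked = sorted(candidates, key=lambda i: cosine(q, self.points[i]), reverse=True)
        return ranked[:k]


# Usage: index 500 random 16-dimensional vectors, then query with a
# slightly perturbed copy of one of them.
rng = random.Random(1)
data = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(500)]
index = HyperplaneLSH(dim=16)
for v in data:
    index.add(v)
query = [x + 0.01 * rng.gauss(0.0, 1.0) for x in data[42]]
print(index.query(query, k=1))
```

Using more tables raises the chance that a true neighbour shares at least one bucket with the query, at the cost of memory. This is the storage cost that the abstract says multi-probe LSH reduces by probing nearby buckets instead of adding tables.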